integration for caching embeddings #27
Conversation
@GabrieleGhisleni to fix the Python 3.8 issues we need to
@GabrieleGhisleni Thank you for contributing! Let me know when this is ready for review, happy to take a look.
@maxjakob Thank you for your willingness to review! The changes are not quite ready yet, but I'll let you know as soon as they are. I appreciate your patience!
@maxjakob The pull request should be ready to be reviewed.
Nice job so far! I'm mainly wondering if there is a hard requirement to store the vectors as lists of floats or if we can store them as bytes and integrate a little nicer with how other caches work?
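The reviewer's suggestion above is to store vectors as bytes rather than as JSON lists of floats. One common way to do that is fixed-width binary packing with the standard library; this is a purely illustrative sketch (the function names and the little-endian float32 layout are assumptions, not the project's actual encoding):

```python
import struct
from typing import List

def floats_to_bytes(vec: List[float]) -> bytes:
    # Pack each value as a little-endian 32-bit float.
    return struct.pack(f"<{len(vec)}f", *vec)

def bytes_to_floats(raw: bytes) -> List[float]:
    # Each float32 occupies 4 bytes.
    n = len(raw) // 4
    return list(struct.unpack(f"<{n}f", raw))

# Values chosen to be exactly representable in float32,
# so the round trip is lossless here.
vec = [0.5, -1.25, 2.0]
raw = floats_to_bytes(vec)        # 12 bytes instead of a JSON array
restored = bytes_to_floats(raw)   # [0.5, -1.25, 2.0]
```

Note that float32 packing is lossy for arbitrary Python floats; whether that precision trade-off is acceptable depends on the embedding model.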
libs/elasticsearch/README.md
Outdated
Caching embeddings is achieved by using the [CacheBackedEmbeddings](https://python.langchain.com/docs/modules/data_connection/text_embedding/caching_embeddings), in a slightly different way than described in the official documentation.
Can you add something on how (and maybe why) it is different?
"In a different way" because the documentation always mentions only the from_bytes_store method, whereas we show that it can be instantiated directly as an argument of CacheBackedEmbeddings.
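To illustrate the distinction: `from_bytes_store` wraps a bytes store, while direct instantiation takes a store that already holds lists of floats. The sketch below shows a minimal in-memory stand-in for such a store, exposing the `mget`/`mset`/`mdelete`/`yield_keys` interface that langchain's `BaseStore` protocol defines (the class name and the dict backing are assumptions for illustration; the real cache in this PR is backed by Elasticsearch):

```python
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

class InMemoryEmbeddingsStore:
    """Minimal stand-in for a BaseStore[str, List[float]].

    A store with this interface can be passed directly as the
    document_embedding_store argument of CacheBackedEmbeddings,
    without going through CacheBackedEmbeddings.from_bytes_store.
    """

    def __init__(self) -> None:
        self._data: Dict[str, List[float]] = {}

    def mget(self, keys: Sequence[str]) -> List[Optional[List[float]]]:
        # Batch get: None for keys that are not cached.
        return [self._data.get(k) for k in keys]

    def mset(self, pairs: Sequence[Tuple[str, List[float]]]) -> None:
        # Batch set of (key, embedding) pairs.
        for k, v in pairs:
            self._data[k] = v

    def mdelete(self, keys: Sequence[str]) -> None:
        # Batch delete; missing keys are ignored.
        for k in keys:
            self._data.pop(k, None)

    def yield_keys(self, *, prefix: Optional[str] = None) -> Iterator[str]:
        # Iterate cached keys, optionally filtered by prefix.
        for k in self._data:
            if prefix is None or k.startswith(prefix):
                yield k
```

With such a store, `CacheBackedEmbeddings(underlying_embeddings, store)` can be constructed directly, which is the pattern the README refers to.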
```python
if self._metadata is not None:
    body["metadata"] = self._metadata
if self._store_input:
    body["text_input"] = text_input
```
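For context, the snippet above conditionally enriches the cache document before indexing. A standalone sketch of that assembly (the `build_body` helper and the `"vector_value"` field name are hypothetical, chosen only to make the example self-contained):

```python
from typing import Any, Dict, List, Optional

def build_body(
    vector: List[float],
    text_input: str,
    metadata: Optional[Dict[str, Any]] = None,
    store_input: bool = True,
) -> Dict[str, Any]:
    # "vector_value" is a hypothetical field name for this sketch.
    body: Dict[str, Any] = {"vector_value": vector}
    if metadata is not None:
        body["metadata"] = metadata
    if store_input:
        body["text_input"] = text_input
    return body

doc = build_body([0.1, 0.2], "some input text", metadata={"model": "demo"})
# doc contains "vector_value", "metadata", and "text_input" fields
```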
I assume you want to store these so you can inspect the cache and learn something about usage statistics? Wondering if this is the right way to do this (or rather collect stats differently), but I'm not opposed to it
The core idea behind storing both the text and embeddings together is to ensure that we have the possibility to reconstruct the model or utilize them in other ways in the future. It is just to preserve valuable information for potential future use cases.
@GabrieleGhisleni no rush on this at all from my side, but feel free to tag me once the merge conflicts are resolved and this is ready for review again :)
@maxjakob great! The conflicts should be fixed.
Great improvements! I left some more comments for cosmetic changes.
@maxjakob I should have fixed the last suggestions. Let me know if it seems ok to you.
Awesome work @GabrieleGhisleni! Thank you for your contribution.
I will release this tomorrow.
…22612) The package for LangChain integrations with Elasticsearch, https://github.com/langchain-ai/langchain-elastic, contains an Elasticsearch byte store cache integration (see langchain-ai/langchain-elastic#27). This is the documentation contribution for the page dedicated to store integrations. Co-authored-by: Gabriele Ghisleni <[email protected]>
Hi all, following our recent discussion in Issue #19334, I have opened this pull request, which specifically adds the caching backend for embeddings.